ABSTRACT

This study examined which method of item selection is most useful for making
pass/fail decisions with a computer adaptive test (CAT). Medical technology
students (n=86) took a computer adaptive
test in which items were targeted to the ability of the examinee. The 
adaptive algorithm that selected items and estimated person measures 
used the Rasch model and a version of maximum likelihood estimation. 
The stopping rule was based on confidence in the pass/fail decision. 
Results indicate that when test length is sufficient, targeting items 
at the ability of the examinee and using a confidence level stopping 
rule results in the most efficient computer adaptive test for making 
a pass/fail decision. Examinees whose ability is clearly above or 
below the pass/fail point then take a minimum number of items, but 
those whose ability is near the pass point take a test of precision 
comparable to a test of items targeted at the pass/fail point. An 
appendix contains an examinee map for the test and a map key. 
(Contains two tables, three figures, and six references.) (SLD) 
Comparison of Item Targeting Strategies
for Computer Adaptive Tests 

When a computer adaptive test (CAT) is used to make pass/fail decisions, 
there are two schools of thought about where items should be targeted. 

One point of view, expressed by Kingsbury and Houser (1990), is that
items should be targeted to the estimated ability of the examinee. When items 
are targeted to the ability of the examinee, the information gained from each 
item is maximized. An adaptive test that provides maximum information should 
provide the clearest indication of the person's position above or below the 
pass point. When information is maximized, the standard error of measure is 
minimized and thus a pass/fail decision can be made with fewer items. 

Targeting items to the estimated ability of the examinee is more efficient for 
examinees with abilities well above or below the pass point and should be 
equally efficient for examinees near the pass point. 

A second point of view, expressed by Wainer (1990), is that the most 
efficient method for determining if a particular person's position is above or 
below the pass point is to present items whose difficulty matches that of the 
pass point. Under this approach the test is adaptive only in its stopping
rule: the computer algorithm continues the test until a specified standard
error of measure or a specified level of confidence in the pass/fail decision
is reached. Wainer believes that the testing process is more efficient when
test items center on the pass point and that it is easier to construct an
item pool around one level of difficulty than across a range of difficulties.
In practice, when a computer adaptive test is targeted to examinee ability and
that ability is not yet known, the items at the beginning of the test are
often poorly targeted. Targeting items to the pass point means that examinees
whose ability lies near the pass point receive a test closely targeted to
their ability even at the beginning of the test. Since these examinees are
the most difficult to classify, targeting items at the pass point will always
produce the optimal test for them.

Factors influencing the usefulness of one targeting procedure over the 
other include the distribution of the examinee population, the stopping rule 
implemented, and the length of the test. Common sense suggests that if the
examinee population is homogeneous, with its median near the pass point,
there is little reason to target items on ability. For minimum competency
tests, however, examinee abilities may be skewed toward the upper end of the
distribution. In this case, or in the case of a widely dispersed population,
it may be more useful to target items to ability, since more examinees will
then take shorter tests.

Test length is an important consideration in the choice of a targeting 
procedure. Short computer adaptive tests, targeted to examinee ability, may 
result in misclassification if examinee ability is not well estimated at the 
beginning of the test or if examinees fail to respond according to model 
expectations early in the test. An example is the highly anxious examinee who
incorrectly answers several items at the beginning of the test.

The potential advantage of one procedure over the other is determined by 
the precision with which examinees are measured and the accuracy of the 
pass/fail decisions rendered. The purpose of this paper is to demonstrate
that the most useful method of item selection for making pass/fail decisions
with a CAT depends on the factors described above.
Method

Medical technology students from across the country took a computer 
adaptive test in which the items were targeted to the ability of the examinee. 
Eighty-six students are included in this study. Other studies using other 
subsets of students and test conditions are reported elsewhere. 

Test Specifications 

The adaptive algorithm that selected items and estimated person
measures used the Rasch model (Rasch, 1960/1980) and the PROX version of
maximum likelihood estimation (Wright and Stone, 1979). The stopping rule was
based on confidence in the pass/fail decision. The test stopped when the
examinee's estimated ability measure was either 1.3 times the error of measure
above the pass point (a clear pass, based on a one-tailed 90% confidence
interval) or 1.3 times the error of measure below the pass point (a clear
fail), or when the maximum test length of 100 items was reached. The minimum
test length was 50 items, and the pass/fail point was set at .15 logits.
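The estimation and stopping steps can be sketched in Python. This is a minimal sketch, not the CAT ADMINISTRATOR code: it uses the standard PROX person-measure formulas for a Rasch test, assumes the variance of item difficulties is computed with divisor n, and the function names `prox_measure` and `stop_decision` are hypothetical. The constants (pass point .15 logits, 1.3 errors of measure, 50 and 100 items) come from the text.

```python
import math

PASS_POINT = 0.15  # logits (from the text)
Z = 1.3            # multiple of the error of measure (from the text)
MIN_ITEMS = 50
MAX_ITEMS = 100


def prox_measure(difficulties, num_correct):
    """PROX person measure and error for a Rasch test (after Wright and Stone, 1979).

    Assumption: the variance of item difficulties uses divisor n.
    Returns (measure, error) in logits. Needs at least one right and one wrong answer.
    """
    n = len(difficulties)
    if not 0 < num_correct < n:
        raise ValueError("PROX needs at least one correct and one incorrect response")
    mean = sum(difficulties) / n
    variance = sum((d - mean) ** 2 for d in difficulties) / n
    expansion = math.sqrt(1.0 + variance / 2.89)  # correction for spread of item difficulties
    measure = mean + expansion * math.log(num_correct / (n - num_correct))
    error = expansion * math.sqrt(n / (num_correct * (n - num_correct)))
    return measure, error


def stop_decision(measure, error, items_taken):
    """Return 'pass', 'fail', 'maximum length', or None to keep testing."""
    if items_taken < MIN_ITEMS:
        return None
    if measure >= PASS_POINT + Z * error:
        return "pass"            # clear pass (about one-tailed 90% confidence)
    if measure <= PASS_POINT - Z * error:
        return "fail"            # clear fail
    if items_taken >= MAX_ITEMS:
        return "maximum length"  # decision made with less than 90% confidence
    return None


if __name__ == "__main__":
    # Hypothetical test: 40 of 50 items right, all items at 0.00 logits.
    m, e = prox_measure([0.0] * 50, 40)
    print(round(m, 3), round(e, 3), stop_decision(m, e, 50))  # prints 1.386 0.354 pass
```

Because the error of measure shrinks as items accumulate, the same 1.3-error band lets examinees far from .15 logits stop at 50 items while those near it run to 100.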

The CAT ADMINISTRATOR (Gershon, 1989) constructed computer adaptive
tests following the content specifications of the traditional paper-and-pencil
certification examination (see Table 1). In the first 50 items, blocks of 10
items were administered from subtests 1-4 and blocks of 5 items from subtests
5 and 6. After 50 items, blocks of 4 items (subtests 1-4) and blocks of 2
items (subtests 5 and 6) were administered.
Subtest order was selected randomly by the computer algorithm. Items were 
chosen at random from unused items within .10 logits of the targeted item 
difficulty within the specified content area. 
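A sketch of the block schedule and the item-selection rule follows, assuming an item bank stored as a dictionary from item id to (subtest code, difficulty). The subtest codes are taken from the map key; mapping subtests 1-4 to the four 20% content areas of Table 1 and subtests 5-6 to the two 10% areas is an assumption, as are the names `round_plan` and `select_item` and the fallback used when no unused item lies within .10 logits of the target.

```python
import random

# Map-key codes. Assumption: subtests 1-4 are the four 20% areas and 5-6 the
# two 10% areas, in Table 1 order.
SUBTESTS = ["M", "BB", "C", "H", "BF", "I"]


def round_plan(items_given, rng):
    """One round of subtest blocks in random order: 10/5 items before item 50, 4/2 after."""
    large, small = (10, 5) if items_given < 50 else (4, 2)
    order = SUBTESTS[:]
    rng.shuffle(order)  # subtest order chosen at random
    return [(s, large if SUBTESTS.index(s) < 4 else small) for s in order]


def select_item(bank, used, subtest, target, rng, window=0.10):
    """Pick an unused item at random within +/- window logits of the target.

    bank maps item id -> (subtest code, difficulty). Falling back to the
    closest unused item when the window is empty is an assumption; the text
    does not say what the algorithm did in that case.
    """
    pool = [i for i, (s, _) in bank.items() if s == subtest and i not in used]
    near = [i for i in pool if abs(bank[i][1] - target) <= window]
    if near:
        return rng.choice(near)
    return min(pool, key=lambda i: abs(bank[i][1] - target), default=None)
```

One round before item 50 covers 4 x 10 + 2 x 5 = 50 items, and each later round covers 4 x 4 + 2 x 2 = 20 items, so both phases keep the 20/20/20/20/10/10 test plan proportions.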

All examinees started the test with an item whose difficulty was near
the pass point (between -.5 and .5 logits). Items were targeted so that
examinees had a 50% probability of a correct response, and the pass/fail
decision was based on the final estimated ability measure. With the PROX
version of maximum likelihood estimation, examinee ability cannot be estimated
until the examinee answers at least one item correctly and one item
incorrectly. Until ability could be estimated, the difficulty of the next item
presented was selected using a stepsize of 1.00 logit.
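The pre-estimate stepping rule can be sketched as follows. The sketch assumes the step is taken from the previous item's difficulty (up after a correct answer, down after an incorrect one); the function names are hypothetical. It also shows why targeting an item at the current estimate gives a 50% chance of success: under the Rasch model the probability is exactly .5 when ability equals difficulty.

```python
import math

STEP = 1.00  # logits, used only until an ability estimate exists (from the text)


def rasch_probability(ability, difficulty):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def next_target(responses, last_difficulty, estimate=None, step=STEP):
    """Target difficulty for the next item.

    responses is a non-empty list of 0/1 scores so far. Stepping from the
    last item's difficulty (rather than from the last target) is an assumption.
    """
    if 0 < sum(responses) < len(responses):
        # Mixed responses: an estimate exists, so target it directly; the
        # Rasch model then gives a 50% chance of success.
        if estimate is None:
            raise ValueError("an ability estimate is required once responses are mixed")
        return estimate
    return last_difficulty + step if responses[-1] == 1 else last_difficulty - step


if __name__ == "__main__":
    first = 0.00  # hypothetical first item difficulty
    second = next_target([0], first)
    third = next_target([0, 0], second)
    print(second, third)  # prints -1.0 -2.0
```

With a 1.00 logit step, two early misses move the target two full logits below the start, which is the kind of poor start described later for the examinee in the Appendix.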

Comparison of Targeting 

Theoretical standard errors of measure were calculated for examinee
abilities from -2.00 logits to +2.00 logits at .05 logit intervals for tests
of 50 and 100 items (Wright and Stone, 1979). The theoretical tests were
targeted either to the pass point (.15 logits) or to examinee ability. These
theoretical standard errors of measure were compared with the observed
standard errors of measure from the computer adaptive test, which was
targeted to the estimated ability of the examinee.
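The theoretical comparison can be sketched with the standard Rasch result that an item contributes information p(1 - p) at a given ability, where p is the probability of success, and SEM = 1/sqrt(total information). With every item targeted at the examinee's ability, p = .5 and SEM = 2/sqrt(L), which gives .28 for L = 50 and .20 for L = 100, the values reported in the text. This is a sketch assuming perfectly targeted items; Wright and Stone's own computational formulas may differ in detail, so the pass-point curve it produces need not reproduce Figure 1 exactly. The function names are hypothetical.

```python
import math

PASS_POINT = 0.15  # logits


def rasch_probability(ability, difficulty):
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def theoretical_sem(ability, difficulties):
    """SEM = 1 / sqrt(test information); Rasch item information is p(1 - p)."""
    information = sum(
        p * (1.0 - p) for p in (rasch_probability(ability, d) for d in difficulties)
    )
    return 1.0 / math.sqrt(information)


def sem_curves(length):
    """(ability, SEM targeted to ability, SEM targeted to pass point) on the study's grid."""
    rows = []
    for k in range(81):  # -2.00 to +2.00 logits in .05 steps
        ability = round(-2.0 + 0.05 * k, 2)
        at_ability = theoretical_sem(ability, [ability] * length)
        at_pass = theoretical_sem(ability, [PASS_POINT] * length)
        rows.append((ability, at_ability, at_pass))
    return rows


if __name__ == "__main__":
    for length in (50, 100):
        for ability, at_ability, at_pass in sem_curves(length)[::20]:
            print(length, ability, round(at_ability, 3), round(at_pass, 3))
```

The two curves meet only at .15 logits, where a pass-point item is also perfectly targeted; everywhere else, targeting to ability gives the smaller SEM.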

Effect of Test Length on Pass/Fail Decisions 

To examine the impact of giving a short CAT, pass/fail classifications
made at 20 items were compared with pass/fail classifications made at the end
of the actual CAT. Only examinees for whom a clear decision (90% confidence)
was made in fewer than 100 items were included in this analysis (N=65).

Results 

Precision of Measurement

Figure 1 shows that for perfectly targeted 50-item fixed-length computer
adaptive tests, targeting to the ability of the examinee (SEM = .28) produces
lower standard errors of measure than targeting to the pass/fail point (SEM
ranges from .28 to .45 logits) for all examinees except those whose ability is
at the pass/fail point. Examinees at the pass/fail point have the same SEM
regardless of which item selection method is used.

In actual testing situations, perfect targeting is not possible and less 
than perfect targeting will result in an increased SEM. Figure 2 shows that 
in the actual CAT, at 50 items, targeting to the ability level of the examinee 
resulted in a slightly larger SEM for examinees near the pass/fail point than 
if items had been targeted at the pass/fail point. However, for most 
examinees, especially those whose ability is relatively far from the pass/fail 
point, the SEM is considerably smaller than would have been attained if the 
items had been targeted at the pass/fail point. These examinees took a more 
efficient test when items were targeted to their current estimated ability 
than they would have if the items had been targeted to the pass point. 

Figure 3 shows the SEM at 100 items for examinees whose ability measure 
is very near the pass/fail point and thus took a maximum length test. The 
mean SEM for these 21 examinees is .202 with a standard deviation of .003. 
Since a perfectly targeted test would yield an SEM of .20, for all practical
purposes, the increase in SEM due to poor targeting early in the test has 
"washed out". For these examinees, targeting either to the pass point or to 
their ability will provide a comparable result. 

Accuracy of the Pass/Fail Decision

When items are targeted to the ability of the examinee, short computer 
adaptive tests may result in misclassification if the examinee does not 
respond according to model expectations at the beginning of the test. The 
examinee map (Gershon, 1989) in the Appendix shows an example of an examinee 
with a poor start. This examinee missed the first two items and, because of
the 1.00 logit stepsize, the difficulty of the third item presented is -1.99
logits. His ability estimate at this point is -1.62. If the test had stopped
at 20 items, the examinee would have failed. However, his test map shows that
he gradually recovers from his initial poor start and at 97 items passes the
test with 90% confidence in the decision.

In the observed CAT, 39 of the 86 examinees passed or failed the test with
90% or greater confidence in the decision at the minimum test length of 50
items. Another 26 examinees passed or failed in 51 to 99 items with 90%
confidence in the decision. The remaining 21 examinees, whose measures were
near the pass/fail point, took the maximum test length of 100 items, and their
pass/fail decisions were made with less than 90% confidence. Thus a clear
pass/fail decision was reached for 65 examinees. Table 2 compares the final
pass/fail results for these 65 examinees with the results that would have been
obtained had the test stopped at 20 items. A different decision would have
been made for 7 of them (7/65, or 11%).
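The counts above can be checked with a few lines of arithmetic. Agreements are the diagonal cells of Table 2 (38 and 20) and reversals the off-diagonal cells (5 and 2); the variable names are illustrative only.

```python
# Counts reported in the text.
at_minimum = 39      # clear decision at 50 items
before_maximum = 26  # clear decision in 51 to 99 items
at_maximum = 21      # reached 100 items without a clear decision

total = at_minimum + before_maximum + at_maximum
clear = at_minimum + before_maximum

# Table 2: agreements on the diagonal, reversals off it.
agreements = 38 + 20
reversals = 5 + 2

assert agreements + reversals == clear
print(f"{total} examinees, {clear} clear decisions")
print(f"reversed at 20 items: {reversals}/{clear} = {reversals / clear:.1%}")
# prints 86 examinees, 65 clear decisions
# prints reversed at 20 items: 7/65 = 10.8%
```

The exact rate, 10.8%, rounds to the 11% given in the text.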

Discussion 

Targeting on Ability 

If test length is sufficient, targeting items at the ability of the 
examinee and using a confidence level stopping rule results in the most 
efficient computer adaptive test for making a pass/fail decision. Examinees 
whose ability is clearly above or below the pass/fail point take a minimum 
number of items. Examinees whose ability is near the pass point take a test
of precision comparable to that of a test composed of items targeted at the
pass/fail point.
When targeting items to ability, one possible way to lessen the effects of a
poor start is to reduce the stepsize (used until maximum likelihood ability
estimates can be calculated) to a small value (.10 to .20 logits). Another is
to constrain the difficulty of the first 5 to 10 items to a specified range
(e.g., ± .10 logits of the pass/fail point or ± .10 logits of the previous
item difficulty) rather than just starting the test with the first item at
the pass point. This would limit the possible negative effect of early
mistargeting for examinees whose final measure is near the pass point.
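The two remedies suggested above can be sketched as follows. This is a minimal sketch: the step of .15 logits is one value from the suggested .10 to .20 range, constraining the first 10 items is one choice from the suggested 5 to 10, and the function names are hypothetical.

```python
PASS_POINT = 0.15  # logits


def small_step_target(last_difficulty, last_correct, step=0.15):
    """Pre-estimate targeting with a reduced stepsize (.10 to .20 logits suggested)."""
    return last_difficulty + step if last_correct else last_difficulty - step


def constrained_target(proposed, anchor, items_given, window=0.10, constrained_items=10):
    """Clamp the target to anchor +/- window for the first constrained_items items.

    anchor may be the pass point or the previous item's difficulty.
    """
    if items_given >= constrained_items:
        return proposed
    return max(anchor - window, min(anchor + window, proposed))


if __name__ == "__main__":
    # Two early misses from a hypothetical first item at 0.00 logits.
    d = 0.0
    for _ in range(2):
        d = small_step_target(d, last_correct=False)
    print(round(d, 2))  # prints -0.3 (a 1.00 logit step would give -2.0)
    # A proposed target of -2.00 during the first items, anchored at the pass point.
    print(round(constrained_target(-2.0, PASS_POINT, items_given=2), 2))  # prints 0.05
```

Either rule keeps the first few items close to the pass point, so a couple of early misses cannot push an examinee near the pass point two logits off target.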

In this study the procedure for administering subtests in blocks may 
have contributed to inaccuracy of decision at 20 items. If an examinee's 
ability was inconsistent across subtests, his performance on the first subtest 
had a great impact on the pass/fail decision at 20 items. For example, an
examinee who did very well on subtest 1 but performed poorly in other areas
of the test would have passed at 20 items but ultimately failed. A better
procedure might be to distribute items across subtests rather than in blocks 
or to use smaller blocks of items for each subtest. 
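One way to distribute items across subtests rather than in blocks is a largest-deficit rule: before each item, pick the subtest that is furthest below its test plan share. This rule is an illustrative assumption, not a procedure from the study; the shares are the Table 1 test plan percentages, keyed by the map-key subtest codes, and `interleaved_order` is a hypothetical name.

```python
# Test plan shares from Table 1, keyed by map-key subtest code.
PLAN = {"M": 0.20, "BB": 0.20, "C": 0.20, "H": 0.20, "BF": 0.10, "I": 0.10}


def interleaved_order(n_items, plan=PLAN):
    """Spread items across subtests instead of administering them in blocks.

    Largest-deficit rule (an assumption): each next item comes from the
    subtest furthest below its planned share after k items.
    """
    counts = {s: 0 for s in plan}
    order = []
    for k in range(1, n_items + 1):
        subtest = max(plan, key=lambda s: plan[s] * k - counts[s])
        counts[subtest] += 1
        order.append(subtest)
    return order


if __name__ == "__main__":
    print(" ".join(interleaved_order(20)))
```

Under this rule every subtest appears within the first six items, so a decision at 20 items reflects all six content areas instead of the first two blocks.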

Targeting on the Pass/Fail Point

If a computer adaptive test is a short test, placing items at the 
pass/fail point and using a stopping rule that requires a specified level of 
precision (SEM) may be a useful combination of procedures. While examinees 
whose ability is far from the pass/fail point may take additional items to 
reach the specified SEM, examinees near the pass/fail point, for whom the 
decision is most difficult to make, will be presented with well targeted items 
even at the start of the test and will thus be measured with the greatest 
precision. 
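The cost of this approach for examinees far from the pass point can be sketched with the Rasch information formula: if every item sits at the pass point, an examinee needs n = 1/(SEM^2 p(1 - p)) items to reach a target SEM, where p is that examinee's probability of success on a pass-point item. The sketch assumes the examinee's ability is known, which it is not during a real test; `items_needed` is a hypothetical name.

```python
import math

PASS_POINT = 0.15  # logits


def items_needed(ability, target_sem, difficulty=PASS_POINT):
    """Items needed to reach target_sem when every item sits at `difficulty`.

    Rasch information per item is p(1 - p), so n items give
    SEM = 1 / sqrt(n p (1 - p)); solve for n and round up.
    """
    p = 1.0 / (1.0 + math.exp(difficulty - ability))
    exact = 1.0 / (target_sem ** 2 * p * (1.0 - p))
    return math.ceil(exact - 1e-9)  # tolerance guards against float round-off


if __name__ == "__main__":
    for ability in (0.15, 1.15, 2.15):
        print(ability, items_needed(ability, 0.20))
```

An examinee at the pass point reaches an SEM of .20 in 100 items, while examinees one or two logits away need noticeably more, which is the trade-off described above.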
Map Key

A. Summary Statistics

1. Total Test/Subtest
T - total 
M - microbiology 
BB - blood banking 
C - chemistry 
H - hematology 
BF - body fluids 
I - immunology 

B. Number of items 

C. Number of items correct 

D. Number of items incorrect 

E. Ability measure 

F. Error of measure 

G. Average item difficulty 

H. Sum of the squares (item difficulty) 

I. Item number 

J. Subtest identifier 

K. Item difficulty 

L. Response selected 

M. Right/wrong (0 = incorrect, 1 = correct)

N. Current estimated ability measure 

O. Current estimated error 

P. Time (in seconds) 

Pass/fail point (.15 logits) 
Table 1

Item Bank Description

                Test Plan      Items    Easiest          Hardest
Subtest         Distribution*  in Bank  Item      Mean   Item      SD
Microbiology    20%            147      -2.89     -.06   2.38      .96
Blood Banking   20%            165      -2.21     -.07   2.94      1.00
Chemistry       20%            142      -3.61     -.07   2.97      1.06
Hematology      20%            135      -2.80     -.05   2.97      .97
Body Fluids     10%            72       -2.24     -.09   3.84      .97
Immunology      10%            65       -2.78     .25    2.04      .96
Bank Scale      100%           726      -3.61     --     3.84      1.00

* The test plan distribution for computer adaptive tests was the same as 
the test plan for the traditional fixed length written certification 
examination. 
Table 2

Pass/Fail Consistency*
Comparison of Decision after 20 Items and Final Decision

                            Decision at 20 Items
                            Pass        Fail
Final Decision    Pass      38          5
                  Fail      2           20

N=65

* For examinees for whom a clear (90% confidence) decision was reached.
  11% of the examinees would have been affected by a short test.